-
Notifications
You must be signed in to change notification settings - Fork 250
Track variable locations #2471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Track variable locations #2471
Conversation
|
Goes on top of #2443. |
e52ad69 to
da325b4
Compare
|
In order to try everything within compiler, I have used the |
| /// If this is a function parameter, its 1-based index | ||
| arg_index: Option<u32>, | ||
| /// Source file location (file:line:column) | ||
| location: Option<FileLineCol>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed, as mentioned in my other comment - but let me know if this is only intended to be set in cases where the location differs from that of the associated instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add comment here to make it clear.
| @@ -273,6 +272,7 @@ pub enum Instruction { | |||
| // ----- debug decorators -------------------------------------------------------------------- | |||
| Breakpoint, | |||
| Debug(DebugOptions), | |||
| DebugVar(miden_core::DebugVarInfo), | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be added as an instruction - only a decorator, just like AssemblyOp.
That said, currently, the AST/assembler doesn't provide a way to attach decorators that don't correspond to an instruction to the AST before assembly.
So - I think the best way to handle this is to support Instruction variants like DebugVar, but do not pretty-print them like instructions, and do not add support in the grammar for parsing them. The compiler will then have an easy way to emit these decorators, but they don't end up getting treated like normal instructions.
I would suggest we add a method on Instruction called has_textual_representation which matches on the instruction variant and returns true if it matches any variant other than those like DebugVar which are not present in the textual format of MASM. I'm happy to bikeshed the name too, but has_textual_representation feels like the most precise way to name it without it unintentionally becoming used for something with different semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree!
| /// Path to the source file (index into string table) | ||
| pub path_idx: u32, | ||
| /// Optional directory containing the file (index into string table) | ||
| pub directory_idx: Option<u32>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant with path_idx, i.e. if we have the path of the source file, then that presumably includes any directory information already, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was intending to kinda match with what other debug info representations do (like DWARF or CodeView), where it tries to keep the base/common root dir shared among other relative paths depending on that root, but since in miden(c) we always operate with full paths, we can simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep it, I'd make a note that when directory_idx is set, then path_idx is a relative path, otherwise path_idx is expected to be absolute.
We will probably want to explore splitting paths into those components in the future, so we can further reduce the size of the serialized debug info, but I'd treat that as lower priority for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a TODO marker for this in 7bbd988, so we can introduce these fields later.
| /// Column where the variable is declared | ||
| pub column: u32, | ||
| /// Scope depth (0 = function scope) | ||
| pub scope_depth: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be interested to understand how one can leverage a non-zero scope_depth in practice, i.e. how is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a variable with the same name in a nested/inner block debugger needs to differentiate between that one and the one from outer scope. I will add an example in form of comment here in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - in the debugger we only need to care about definitions at the current depth (and above, i.e. lower depth) for purposes of being able to show variable information at the current program point. We don't need to worry about conflicts at the same depth, since only one variable definition at that depth can exist, even if there are multiple scopes with that depth in the current function.
Definitely a good idea to document those semantics here I think, but at least the usefulness is clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the clarification!
processor/src/fast/mod.rs
Outdated
| let process = &mut self.state(); | ||
| host.on_trace(process, *id)?; | ||
| }, | ||
| Decorator::DebugVar(_debug_var) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this behavior, I think I would reword this as:
DebugVar is strictly a metadata-carrying decorator used by downstream tooling, i.e. it is non-actionable in the processor itself.
That said, IMO we need some way for the debugger to be notified when variable information is found/changes, so that it can update its internal state accordingly, otherwise we will be forced to step every instruction manually to check for this decorator in case that information changes. I'd prefer not to have to do that.
IMO, we should manifest this either as a raw trace event, modify on_debug to support triggering that callback for this decorator, or add a new callback that is specific to DebugVar. Whichever approach we choose, we'd only invoke the callback when tracing is enabled in the processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't speak to whether this decorator is the right approach for the debugger. Though with #2541 we want to move away from representing AsmOp (and now DebugVar too) as decorators that can be visited needlessly by the processor during execution.
Though for now this is the best we can do, provided that the DebugVar decorator is the right approach for the debugger.
Note: is this a feature that would only be accessible for programs written in Rust, or also from MASM? Or in other words, would there be fewer features made available by the debugger for debugging programs written in MASM compared to programs written in Rust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: is this a feature that would only be accessible for programs written in Rust, or also from MASM? Or in other words, would there be fewer features made available by the debugger for debugging programs written in MASM compared to programs written in Rust?
If there is a way of using a concept of "variable" at MASM level we can create a decorator for it -- since this is not Rust specific, but rather an abstraction. However, it needs to have a name, type..., so debugger can decode its value properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plafer Since MASM doesn't have variables, this decorator wouldn't ever be produced from Miden Assembly sources. The closest we have are procedure locals, but those aren't variables in any practical sense; rather they collectively represent a fixed-size region of automatically-allocated, procedure-scoped memory, so I don't think it would make sense to use DebugVar for them - at least not without some MASM syntax for local variables that gives them a name and type, and is then used to derive the size and placement of procedure locals needed for that procedure.
But, as @djolertrk noted, this can be used for any source language with variables, it isn't Rust-specific.
On the topic of debug decorators like this and AssemblyOp, I'd be happy to replace them with something that doesn't rely on the decorator mechanism at all, but I'm not actually sure what that would be. All we really need is a way to map a MAST node to the debug information that applies to it, ideally with a way to disambiguate the debug information based on context (e.g. current call stack). AFAIK, that's essentially how decorators work now though (sans the disambiguation bit), so it isn't clear to me what we gain or lose by adding decorators like this. Definitely open to figuring out how best to handle this sort of thing going forward though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With #2541 we basically already have a plan to move out any current "decorator" which only "provides metadata" (TLDR they get their own separate CSR matrix). But since that doesn't exist yet, we can just add the DebugVar as a decorator for now (as-is in this PR), and we'll pull it out along with AsmOp when we address #2541.
bitwalker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty good! See my comments for more specific feedback on things I think we may need to change.
One thing that I think may be worth adding to miden_mast_package as part of this, is a higher-level type or set of types which reify the encoded representation of the debug information into usable form that has been validated and can be directly queried by tools such as the debugger.
As it stands, what you get from deserializing the section is still a very raw form (i.e. references to other data in the section is not materialized/validated). That is useful as a low-level primitive, but means that any tooling that wants to actually use the information encoded in the package has to build that high-level representation themselves. While in some cases that may be necessary/desirable, I suspect that in practice, a single canonical representation defined in miden_mast_package would be sufficient for the uses we have in mind.
My only other note is that I'd like to see a bit more documentation on what the various bits of information are used for, how to put them together. This would of course be one thing solved by providing a higher-level representation of the debug info, but even then there are some bits of information that are somewhat unclear on what their usage should be (e.g. my comment on scope_depth).
Ultimately, one of the things I'd want the high-level representation to provide, would be a set of API functions that correspond to important debugger queries, without the debugger needing to know too much about the internal encoding of the debug info provided by a Package.
Thanks a lot for your comments! I have rebased this on top of the latest next branch, and addressed all comments. My comment regarding API for the debugger -- I have left a comment with a TODO marker, so we can pick it up once compiler side is ready and when we can have hands-on on the consumer/debugger side. Please do note that I have added a couple of documents in the 0xMiden/compiler#820 that describe the debug info design at both IRs level and in the final package. |
- Add DebugVarInfo and DebugVarLocation
- Add Debug info types:
- DebugInfoSection (main container)
- DebugTypeInfo - Type definitions (Primitive, Pointer, Array, Struct, Function, Unknown)
- DebugPrimitiveType - 16 primitive types (Void through Word)
- DebugFileInfo
- DebugFunctionInfo
- DebugVariableInfo
- DebugInlinedCallInfo (for Inlined call site tracking)
- debug_info/serialization.rs - Full serialization/deserialization of
the debug info section
3289e5e to
193b063
Compare
|
As you can see in the 0xMiden/compiler#745, I have back-ported these commits from miden-vm based on "next" to a branch based on "miden-vm v19.0.1" (https://github.com/walnuthq/miden-vm/tree/feature/debug-variable-locations-v0.19.1) so I can test it all together with compiler and debugger, so the latest commits here represent debugger API that is tested with 0xMiden/miden-debug#17. |
6774b84 to
132a8c6
Compare
- Delete accidentally committed ._target file, add ._* to .gitignore - Change DebugVarInfo.name from String to Arc<str> for sharing references - Change DebugVarInfo.arg_index to Option<NonZeroU32> - Change DebugVarLocation::Const to use Felt instead of u64 - Add Instruction::has_textual_representation() method for DebugVar - Update print.rs to return Document::Empty for non-textual instructions - Add documentation explaining location should only differ from AssemblyOp - Add documentation for directory_idx explaining relative path usage - Use LineNumber/ColumnNumber types in mast-package debug_info types - Document scope_depth usage with detailed examples - Update processor callback comments for DebugVar decorator - Add miden-debug-types dependency to mast-package - Simplify DebugFileInfo to use full paths only
Add a new callback mechanism for DebugVar decorators to enable debuggers to track source-level variable information during execution. This allows debuggers like miden-debug to receive notifications when variable location information changes, enabling features like displaying local variables and their values during debugging sessions.
- Changed Local(u16) to Local(i16) to store signed FMP offset directly - The offset is computed as: idx - num_locals - Updated Display format - Updated serialization to handle i16 - Address is computed as: FMP + offset
132a8c6 to
f0ae386
Compare
Handle
DebugVardecorator and newdebug_infosection in the final .masp.